
Conversation


@milesial milesial commented Jan 3, 2023

What does this PR do?

When training with native AMP and a LR scheduler, we get this warning, which indicates that an LR step has been taken while an optimizer step was skipped (expected at the beginning of training with native AMP):

/usr/local/lib/python3.8/dist-packages/torch/optim/lr_scheduler.py:138: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
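For context, a minimal reproduction sketch (not code from this PR, and assuming a CUDA device) of how native AMP can skip the first optimizer steps while the scheduler still advances, which triggers the warning above:

```python
import torch

model = torch.nn.Linear(4, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    optimizer.zero_grad()
    with torch.autocast("cuda"):
        loss = model(torch.randn(2, 4, device="cuda")).sum()
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # may be skipped while the loss scale calibrates (inf/NaN grads)
    scaler.update()
    scheduler.step()  # stepping unconditionally here is what emits the UserWarning above
```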

Fixes #16228 #5558

Does your PR introduce any breaking changes? If yes, please list them.

No

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

@milesial milesial requested a review from justusschock as a code owner January 3, 2023 20:22
@milesial
Author

milesial commented Jan 3, 2023

In the process of fixing tests, I discovered and fixed a bug where a scheduler wouldn't match its optimizer when multiple optimizers are instantiated with frequencies. Now the optimizers and schedulers match and alternate as they should, resetting the cycle every epoch.
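For reference, a hedged sketch of the kind of multi-optimizer setup with frequencies this refers to, under the automatic-optimization API the PR targets (class, attribute, and optimizer names are illustrative, not taken from the PR or its tests):

```python
import torch
import pytorch_lightning as pl

class TwoOptimizerModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model_a = torch.nn.Linear(4, 4)
        self.model_b = torch.nn.Linear(4, 4)

    def configure_optimizers(self):
        # Two optimizers with frequencies: optimizer 0 is used for 2 batches,
        # then optimizer 1 for 1 batch, and the cycle repeats. Each scheduler
        # should step only together with its own optimizer.
        opt_a = torch.optim.SGD(self.model_a.parameters(), lr=0.1)
        opt_b = torch.optim.SGD(self.model_b.parameters(), lr=0.01)
        sched_a = torch.optim.lr_scheduler.StepLR(opt_a, step_size=10)
        sched_b = torch.optim.lr_scheduler.StepLR(opt_b, step_size=10)
        return (
            {"optimizer": opt_a, "lr_scheduler": sched_a, "frequency": 2},
            {"optimizer": opt_b, "lr_scheduler": sched_b, "frequency": 1},
        )
```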

@milesial
Author

milesial commented Jan 9, 2023

@carmocca Ready for final review

    return
active_optimizers = _get_active_optimizers(
-   self.trainer.optimizers, self.trainer.optimizer_frequencies, self.total_batch_idx
+   self.trainer.optimizers, self.trainer.optimizer_frequencies, self.batch_idx
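To illustrate the change above, a simplified sketch of how frequency-based selection cycles through optimizers (assumed behavior, not the actual _get_active_optimizers implementation); using the epoch-local batch_idx instead of total_batch_idx means the cycle restarts at each epoch:

```python
from typing import List

def pick_active_optimizer(frequencies: List[int], batch_idx: int) -> int:
    """Illustrative only: return the index of the optimizer active for this batch."""
    pos = batch_idx % sum(frequencies)
    for opt_idx, freq in enumerate(frequencies):
        if pos < freq:
            return opt_idx
        pos -= freq
    raise RuntimeError("unreachable")

# With frequencies [2, 1]: two batches on optimizer 0, one on optimizer 1, repeat.
assert [pick_active_optimizer([2, 1], i) for i in range(6)] == [0, 0, 1, 0, 0, 1]
```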
Contributor

Could you add a test to verify this works properly?

Author

I modified the third case of test_step_scheduling_for_multiple_optimizers_with_frequency so that it tests this.

@awaelchli awaelchli added community This PR is from the community optimization labels Jan 9, 2023
@carmocca carmocca requested a review from Borda as a code owner January 10, 2023 16:13
Contributor

@carmocca carmocca left a comment

Can you check the failing tests?

setup.cfg (Outdated)

  cloud:Run the cloud tests for example
filterwarnings =
  error::FutureWarning
+ error:Detected call of `lr_scheduler.step\(\)` before `optimizer.step\(\)`:UserWarning
Contributor

I added this line so that our CI fails if this warning appears. This way it tests that your patch works as expected.
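For illustration, this is roughly the mechanism the setup.cfg entry relies on, shown as a standalone pytest example using a per-test marker instead of the ini option (the test itself is hypothetical, not part of the PR):

```python
import warnings
import pytest

@pytest.mark.filterwarnings("error:Detected call of:UserWarning")
def test_scheduler_warning_becomes_an_error():
    # With the "error" action, any matching UserWarning is raised as an exception,
    # so a test (or CI run) that triggers it fails instead of silently passing.
    with pytest.raises(UserWarning):
        warnings.warn(
            "Detected call of `lr_scheduler.step()` before `optimizer.step()`.",
            UserWarning,
        )
```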

Author

Thanks, but this also makes the IPU tests fail, and this PR is focused on GPU. I'm not sure where to fix the issue for IPUs.

Contributor

@milesial IMHO, we could just skip the IPU test failures as done in my old PR #11755 for now.

@milesial
Author

milesial commented Jan 10, 2023

The way I fixed the tests/tests_pytorch/models/test_hooks.py::test_trainer_model_hook_system_fit[True-kwargs1] test is very flaky, so I'd appreciate it if someone more familiar with these tests could come up with a better fix.
edit: it seems like it didn't even fix it...

@milesial
Author

I also modified native_amp.py in both lightning_fabric and pytorch_lightning. It doesn't seem like the lightning_fabric one is called in a typical workflow, so I'm not sure whether that's the right approach.
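For context, a common user-level workaround pattern for this issue, sketched under the assumption that a decreasing loss scale indicates a skipped step (this is not necessarily what the PR's native_amp.py change does):

```python
import torch

model = torch.nn.Linear(4, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    optimizer.zero_grad()
    with torch.autocast("cuda"):
        loss = model(torch.randn(2, 4, device="cuda")).sum()
    scaler.scale(loss).backward()
    scale_before = scaler.get_scale()
    scaler.step(optimizer)      # skipped if inf/NaN gradients were found
    scaler.update()             # lowers the scale when the step was skipped
    if scaler.get_scale() >= scale_before:
        scheduler.step()        # advance the LR schedule only after a real optimizer step
```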

@Lightning-AI Lightning-AI deleted a comment from stale bot Apr 25, 2023
@Borda
Contributor

Borda commented Apr 25, 2023

> Hi @Borda, I also encounter the same issue. Will this be merged?

Let me check what is missing here...

@mergify mergify bot removed the has conflicts label Apr 26, 2023
@alimoezzi

Is this PR merged already? I'm still having this issue.

@Borda
Contributor

Borda commented May 23, 2023

There were some failing tests. @milesial, mind having a look?


gitguardian bot commented Nov 18, 2023

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend revoking them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
You can find more information about the risks here.




codecov bot commented Nov 18, 2023

Codecov Report

❌ Patch coverage is 28.57143% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 48%. Comparing base (1fc077b) to head (5bff40a).

❌ Your patch check has failed because the patch coverage (29%) is below the target coverage (50%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (48%) is below the target coverage (99%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (1fc077b) and HEAD (5bff40a). Click for more details.

HEAD has 378 uploads less than BASE

Flag               BASE (1fc077b)   HEAD (5bff40a)
cpu                108              27
python3.10         24               6
lightning_fabric   26               14
pytest             57               30
lightning          61               16
python3.11         24               6
python             12               3
python3.12         12               3
python3.12.7       36               9
pytorch2.1         12               0
pytest-full        54               0
pytorch_lightning  24               0
pytorch2.7         6                0
pytorch2.4.1       6                0
pytorch2.5.1       6                0
pytorch2.2.2       6                0
pytorch2.6         6                0
pytorch2.8         6                0
pytorch2.3         6                0
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #16229     +/-   ##
=========================================
- Coverage      87%      48%    -39%     
=========================================
  Files         269      266      -3     
  Lines       23656    23606     -50     
=========================================
- Hits        20633    11419   -9214     
- Misses       3023    12187   +9164     

@mergify mergify bot removed the has conflicts label Feb 16, 2024
@carmocca carmocca removed their assignment May 6, 2024
Copy link

stale bot commented Apr 26, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help, see our docs: https://lightning.ai/docs/pytorch/latest/generated/CONTRIBUTING.html#pull-request or ask for the assistance of a core contributor here or on Discord. Thank you for your contributions.

@stale stale bot added the won't fix This will not be worked on label Apr 26, 2025
@stale stale bot removed won't fix This will not be worked on labels Aug 19, 2025
@Borda Borda added the waiting on author Waiting on user action, correction, or update label Sep 5, 2025

Labels

  • community: This PR is from the community
  • fabric: lightning.fabric.Fabric
  • optimization
  • pl: Generic label for PyTorch Lightning package
  • waiting on author: Waiting on user action, correction, or update

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wrong LR scheduler behaviour when using AMP

8 participants